15 research outputs found

    Augmenting automatic speech recognition and search models for spoken content retrieval

    Spoken content retrieval (SCR) is the process of providing a user with spoken documents in which the user is potentially interested. Unlike textual documents, speech is not trivial to search because of its representation. Generally, automatic speech recognition (ASR) is used to transcribe spoken content, such as user-generated videos and podcast episodes, into transcripts before search operations are performed. Despite recent improvements in ASR, transcription errors can still be present in automatic transcripts, particularly when ASR is applied to out-of-domain data or speech with background noise. This thesis explores the improvement of ASR systems and search models for enhanced SCR on user-generated spoken content. Three topics are explored. Firstly, the use of multimodal signals for ASR is investigated, with the aim of integrating the background context of spoken content into ASR. Integrating visual signals and document metadata into ASR is hypothesised to produce transcripts better aligned with the background context of the speech. Secondly, semi-supervised training and content genre information from metadata are exploited for ASR, with the aim of mitigating the transcription errors caused by recognition of out-of-domain speech. Thirdly, neural models and their extension using N-best ASR transcripts are investigated. Using N-best rather than 1-best transcripts for search models is motivated by the fact that "key terms" missed in the 1-best transcript can be present in the N-best transcripts. A series of experiments is conducted to examine these approaches to improving ASR systems and search models. The findings suggest that semi-supervised training brings practical improvement to ASR systems for SCR, and that neural ranking models, in particular with N-best transcripts, improve known-item search results over the baseline BM25 model.

    Eyes and ears together: new task for multimodal spoken content analysis

    Human speech processing is often a multimodal process combining audio and visual processing. Eyes and Ears Together proposes two benchmark multimodal speech processing tasks: (1) multimodal automatic speech recognition (ASR) and (2) multimodal co-reference resolution on spoken multimedia. These tasks are motivated by our desire to address the difficulties of ASR for multimedia spoken content. We review prior work on the integration of multimodal signals into speech processing for multimedia data, introduce a multimedia dataset for our proposed tasks, and outline these tasks.

    Stacked Denoising Autoencoder for the Front-end of DNN-based Speech Synthesis


    Similarity-Based Heterogeneous Neurons in the Context of General Observational Models

    This paper presents a framework for processing heterogeneous information based on the construction of general observational domains, and similarity-based function calculi suitable for data mining in domains which can be described by the corresponding observational models. These calculi are intuitive, simple, and sufficiently general for classification and pattern recognition tasks. Functions in these calculi are represented by a particular kind of neuron model, and their behavior is illustrated with examples from real-world domains showing their capabilities in processing heterogeneous, incomplete and fuzzy information. NRC publication: Ye

    Infected lung bulla caused by Neisseria elongata: A case report

    Neisseria elongata is a rod-shaped, Gram-negative, aerobic bacterium that is part of the normal oral bacterial flora. Although previously considered a non- or low-pathogenic organism, the development of bacterial detection methods has resulted in increased reports of N. elongata infections, such that it has recently been recognized as a causative agent of serious infections even in non-immune-compromised patients. A 77-year-old man with rheumatoid arthritis-associated interstitial lung disease, chronic obstructive pulmonary disease, and diabetes mellitus was diagnosed with a nodule in the left lower lobe of his lung. Thoracoscopic wedge resection was performed, and pus was discharged from the specimen. Mass spectrometry of the swab culture revealed N. elongata. The patient's postoperative course was uneventful, and he was doing well without recurrence at 13 months after surgery. Since N. elongata is part of the oral bacterial flora, the patient consulted a local dentist, and decayed teeth were extracted. Most of the reported cases of serious N. elongata infections have described infective endocarditis. This is the first report of infected lung bulla due to N. elongata infection, which demonstrates a new pathogenicity.